Timothy W Russell*, Joel Hellewell, Sam Abbott, Nick Holding, Hamish Gibbs, Christopher I Jarvis, Kevin Van Zandvoort, CMMID COVID-19 working group, Stefan Flasche, Rosalind M Eggo, W John Edmunds, Adam J Kucharski

authors contributed equally

* corresponding author

Last Updated: 2020-05-13

Aim

To estimate the percentage of symptomatic COVID-19 cases reported in different countries using case fatality ratio estimates based on data from the ECDC, correcting for delays between confirmation and death.

Methods Summary

Current estimates for percentage of symptomatic cases reported for countries with greater than ten deaths

Temporal variation

Figure 1: Temporal variation in reporting rate. We calculate the percentage of symptomatic cases reported on each day a country has had more than ten deaths. We then fit a Gaussian Process (GP) to these data (see Temporal variation model fitting section for details), highlighting the temporal trend of each country's reporting rate. The red shaded region is the 95% CrI of the fitted GP.

Adjusted symptomatic case estimates

Figure 2: Estimated number of new symptomatic cases, calculated using our temporal under-reporting estimates. For each country with an under-reporting estimate, we adjust the reported case numbers each day using our temporal under-reporting estimates to arrive at an estimate of the true number of symptomatic cases each day. The shaded blue region represents the 95% CrI, calculated directly from the 95% CrI of the temporal under-reporting estimate.
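The adjustment described in the Figure 2 caption can be sketched as a simple division. This is a minimal illustration, assuming the adjusted count is the reported count divided by the reporting fraction and that the interval is obtained by dividing by the interval bounds; the function name `adjust_cases` and the example numbers are hypothetical.

```python
def adjust_cases(reported, frac_mid, frac_low, frac_high):
    """Scale a reported daily case count by an estimated reporting fraction.

    Assumption: adjusted cases = reported / reporting fraction.
    frac_low and frac_high are the 95% CrI bounds of the reporting fraction.
    A higher reporting fraction implies fewer true cases, so the bounds swap.
    """
    for frac in (frac_mid, frac_low, frac_high):
        if not 0.0 < frac <= 1.0:
            raise ValueError("reporting fractions must lie in (0, 1]")
    return {
        "estimate": reported / frac_mid,
        "lower": reported / frac_high,
        "upper": reported / frac_low,
    }


# Hypothetical day: 100 reported cases, 25% reported (95% CrI: 20%-50%).
example = adjust_cases(100, 0.25, 0.20, 0.50)
```

Note that the lower bound of the adjusted count comes from the upper bound of the reporting fraction, and vice versa.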

Reported cases

Figure 3: Reported number of cases each day, pulled from the ECDC and plotted against time for comparison with our estimated true numbers of symptomatic cases each day, adjusted using our under-reporting estimates.

Table of current estimates

Country Percentage of symptomatic cases reported (95% CI) Total cases Total deaths
Afghanistan 38% (27%-53%) 4,967 127
Albania 79% (39%-100%) 876 31
Algeria 35% (23%-50%) 6,067 515
Andorra 23% (13%-39%) 758 48
Argentina 21% (16%-28%) 6,550 319
Armenia 74% (50%-98%) 3,538 47
Australia 86% (52%-100%) 6,964 97
Austria 39% (25%-61%) 15,910 623
Azerbaijan 85% (59%-100%) 2,693 33
Bahamas 48% (11%-99%) 93 11
Bangladesh 96% (75%-100%) 16,660 250
Belarus 99% (93%-100%) 24,873 142
Belgium 12% (9.5%-15%) 53,779 8,761
Bolivia 20% (15%-28%) 2,964 128
Bosnia and Herzegovina 10% (5.9%-16%) 2,158 117
Brazil 13% (10%-15%) 177,589 12,400
Bulgaria 26% (18%-40%) 2,023 95
Burkina Faso 23% (14%-40%) 766 51
Cameroon 74% (22%-100%) 2,689 125
Canada 14% (11%-17%) 71,157 5,169
Chad 5.7% (3.4%-10%) 357 40
Chile 90% (75%-99%) 31,721 335
China 99% (100%-100%) 84,018 4,637
Colombia 26% (20%-33%) 12,272 493
Congo 63% (22%-100%) 333 11
Cote dIvoire 85% (57%-100%) 1,857 21
Croatia 20% (11%-32%) 2,207 91
Cuba 33% (21%-51%) 1,804 78
Cyprus 73% (38%-100%) 903 23
Czechia 35% (26%-46%) 8,221 283
Democratic Republic of the Congo 24% (12%-42%) 1,169 50
Denmark 34% (24%-46%) 10,591 527
Dominican Republic 51% (38%-65%) 10,900 402
Ecuador 5.9% (4.8%-7.2%) 30,419 2,327
Egypt 31% (23%-40%) 10,093 544
El Salvador 51% (26%-91%) 1,037 20
Estonia 34% (21%-52%) 1,746 61
Finland 38% (23%-58%) 6,003 275
France 9.5% (7.7%-11%) 140,227 26,991
Germany 24% (19%-29%) 171,306 7,634
Ghana 95% (81%-100%) 5,127 22
Greece 32% (20%-50%) 2,744 152
Guatemala 44% (25%-73%) 1,199 27
Guernsey 42% (13%-95%) 252 13
Honduras 20% (13%-32%) 2,080 121
Hungary 11% (8%-16%) 3,341 430
Iceland 87% (55%-100%) 1,801 10
India 35% (28%-42%) 74,281 2,415
Indonesia 26% (18%-35%) 14,749 1,007
Iran 34% (28%-41%) 110,767 6,733
Iraq 55% (27%-92%) 2,913 112
Ireland 28% (21%-37%) 23,242 1,488
Isle of Man 21% (7.9%-61%) 331 23
Israel 82% (63%-99%) 16,529 260
Italy 15% (12%-18%) 221,216 30,911
Japan 18% (13%-24%) 16,024 668
Jersey 14% (6.6%-33%) 295 26
Kazakhstan 97% (85%-100%) 5,417 32
Kenya 19% (11%-29%) 715 36
Kosovo 46% (26%-77%) 919 29
Kuwait 84% (61%-100%) 10,277 75
Kyrgyzstan 74% (36%-100%) 1,044 12
Latvia 53% (25%-95%) 950 18
Lebanon 55% (29%-96%) 870 26
Liberia 36% (8.7%-91%) 212 20
Lithuania 40% (24%-65%) 1,491 50
Luxembourg 51% (33%-70%) 3,894 102
Malaysia 95% (76%-100%) 6,742 109
Mali 24% (15%-38%) 730 40
Mauritius 60% (16%-100%) 332 10
Mexico 7.9% (6.5%-9.5%) 38,324 3,926
Moldova 29% (21%-38%) 5,154 182
Morocco 95% (79%-100%) 6,418 188
Netherlands 16% (13%-20%) 42,984 5,510
New Zealand 46% (22%-85%) 1,147 21
Niger 18% (7.9%-32%) 854 47
Nigeria 25% (18%-33%) 4,787 158
North Macedonia 26% (17%-42%) 1,674 92
Norway 38% (16%-67%) 8,135 228
Oman 95% (80%-100%) 3,721 17
Pakistan 43% (33%-54%) 34,336 737
Panama 49% (35%-65%) 8,783 252
Paraguay 84% (49%-100%) 737 10
Peru 42% (33%-51%) 72,059 2,057
Philippines 21% (16%-26%) 11,350 751
Poland 26% (20%-33%) 16,921 839
Portugal 40% (31%-50%) 27,913 1,163
Puerto Rico 33% (21%-49%) 2,299 114
Qatar 93% (68%-100%) 25,149 14
Romania 18% (14%-22%) 15,778 1,002
Russia 90% (78%-99%) 232,243 2,116
San Marino 75% (40%-100%) 638 41
Saudi Arabia 99% (93%-100%) 42,925 264
Serbia 87% (61%-100%) 10,243 220
Singapore 94% (74%-100%) 24,671 21
Sint Maarten 13% (4.2%-34%) 77 15
Slovakia 64% (37%-98%) 1,465 27
Slovenia 17% (11%-25%) 1,461 102
Somalia 26% (15%-46%) 1,170 52
South Africa 45% (34%-59%) 11,350 206
South Korea 42% (17%-79%) 10,962 259
Spain 16% (13%-19%) 228,030 26,920
Sudan 18% (11%-27%) 1,661 80
Sweden 15% (11%-18%) 27,272 3,313
Switzerland 25% (19%-32%) 30,297 1,560
Thailand 74% (50%-99%) 3,017 56
Tunisia 54% (25%-97%) 1,032 45
Turkey 66% (53%-79%) 141,475 3,894
Ukraine 36% (26%-50%) 16,023 425
United Arab Emirates 88% (67%-100%) 19,661 203
United Kingdom 16% (13%-19%) 226,463 32,692
United Republic of Tanzania 43% (22%-79%) 509 21
United States of America 33% (27%-39%) 1,369,964 82,387
Uruguay 42% (20%-83%) 717 19
Uzbekistan 91% (66%-100%) 2,547 10
Venezuela 82% (44%-100%) 423 10

Table 1: Estimates for the proportion of symptomatic cases reported in different countries using cCFR estimates based on case and death time-series data from the ECDC. Total cases and deaths in each country are also shown. Confidence intervals were calculated using an exact binomial test at the 95% confidence level.
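Table 1 states that its intervals come from an exact binomial test. As a hedged illustration, the sketch below computes one common exact interval for a binomial proportion (Clopper-Pearson) using only the standard library. The function names are hypothetical, and how the interval on a proportion is carried over to the reporting percentage is not specified here, so that step is left out.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    total = 0.0
    for i in range(k + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(1.0, total)


def _root(func, lo=0.0, hi=1.0, iters=100):
    """Bisection for a function that is negative at lo and positive at hi."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if func(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided interval for a proportion with x successes in n trials."""
    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _root(
        lambda p: (1.0 - binom_cdf(x - 1, n, p)) - alpha / 2
    )
    # Upper bound: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else _root(
        lambda p: alpha / 2 - binom_cdf(x, n, p)
    )
    return lower, upper
```

The summation in `binom_cdf` is linear in `k`, so this sketch is only practical for modest counts; a statistics library would be the usual choice for national totals.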

Adjusting for outcome delay in CFR estimates

During an outbreak, the naive CFR (nCFR), i.e. the ratio of reported deaths to date to reported cases to date, will underestimate the true CFR because the outcome (recovery or death) is not known for all cases [5]. We can therefore estimate the true denominator for the CFR (i.e. the number of cases with known outcomes) by accounting for the delay from confirmation to death [1].

We assumed the delay from confirmation to death followed the same distribution as the estimated delay from hospitalisation to death, based on data from the COVID-19 outbreak in Wuhan, China, between 17 December 2019 and 22 January 2020, accounting for right-censoring in the data as a result of as-yet-unknown disease outcomes (Figure 1, panels A and B in [7]). The distribution used is a lognormal fit with a mean delay of 13 days and a standard deviation of 12.7 days [7].
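To use this delay distribution with daily data, it has to be turned into daily probabilities. The sketch below is one way to do that, assuming the probability for day \(j\) is the lognormal mass on the interval from \(j\) to \(j + 1\) days; this discretisation and the function names are illustrative assumptions, not taken from the analysis code. The mean of 13 days and standard deviation of 12.7 days are converted to log-scale parameters with the standard moment identities.

```python
import math


def lognormal_params(mean, sd):
    """Convert a natural-scale mean and sd to log-scale (mu, sigma)."""
    sigma_sq = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - 0.5 * sigma_sq
    return mu, math.sqrt(sigma_sq)


def lognormal_cdf(x, mu, sigma):
    """Cumulative distribution function of a lognormal at x."""
    if x <= 0.0:
        return 0.0
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def daily_delay_pmf(max_days, mean=13.0, sd=12.7):
    """f_j = P(j <= delay < j + 1) for j = 0, ..., max_days (assumed binning)."""
    mu, sigma = lognormal_params(mean, sd)
    return [
        lognormal_cdf(j + 1, mu, sigma) - lognormal_cdf(j, mu, sigma)
        for j in range(max_days + 1)
    ]


# Daily probabilities of dying j days after confirmation, for j = 0..60.
f = daily_delay_pmf(60)
```

The probabilities sum to slightly less than one for any finite `max_days` because the lognormal has a long right tail.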

To correct the CFR, we use the case and death incidence data to estimate the proportion of cases with known outcomes [1,6]:

\[ u_{t} = \frac{ \sum_{j = 0}^{t} c_{t-j} f_j}{c_t}, \]

where \(u_t\) represents the underestimation of the proportion of cases with known outcomes [1,5,6] and is used to scale the value of the cumulative number of cases in the denominator in the calculation of the cCFR, \(c_{t}\) is the daily case incidence at time \(t\), and \(f_j\) is the proportion of cases with a delay of \(j\) days between confirmation and death.
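As a rough sketch of how this correction might be applied to a time series, the code below computes the expected number of cases with known outcomes on each day, \(\sum_{j} c_{t-j} f_j\), and accumulates it over the series. Accumulating over days is an assumption made here so that the result scales the cumulative case count, as the text describes; the function names and the toy inputs are hypothetical.

```python
def expected_known_outcomes(cases, delay_pmf):
    """For each day t, sum_{j=0}^{t} c_{t-j} * f_j."""
    known = []
    for t in range(len(cases)):
        total = 0.0
        for j in range(t + 1):
            # Delays beyond the end of delay_pmf are treated as probability 0.
            if j < len(delay_pmf):
                total += cases[t - j] * delay_pmf[j]
        known.append(total)
    return known


def corrected_cfr(cases, deaths, delay_pmf):
    """Return (u, nCFR, cCFR) for daily case and death incidence series."""
    total_cases = sum(cases)
    total_deaths = sum(deaths)
    known_total = sum(expected_known_outcomes(cases, delay_pmf))
    if total_cases <= 0 or known_total <= 0:
        raise ValueError("need at least one case with a known outcome")
    u = known_total / total_cases          # proportion with known outcomes
    ncfr = total_deaths / total_cases      # naive CFR
    ccfr = total_deaths / known_total      # delay-corrected CFR
    return u, ncfr, ccfr


# Toy series: half of deaths occur 1 day after confirmation, half after 2.
u, ncfr, ccfr = corrected_cfr([10, 20, 30], [0, 1, 1], [0.0, 0.5, 0.5])
```

In this toy example only a third of cases have a known outcome, so the corrected CFR is three times the naive one.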

Approximating the proportion of symptomatic cases reported

At this stage, raw estimates of the CFR of COVID-19 correcting for delay to outcome, but not under-reporting, have been calculated. These estimates range between 1% and 1.5% [1–3]. We assume a CFR of 1.4% (95% CrI: 1.2-1.7%), taken from a recent large study [3], as a baseline CFR. We use it to approximate the potential level of under-reporting in each country. Specifically, we calculate \(\frac{1.4\%}{\text{cCFR}}\) for each country to approximate the fraction of symptomatic cases reported.
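A worked example of this ratio, as a minimal sketch: a country with a cCFR of 10% would have an estimated \(1.4\% / 10\% = 14\%\) of symptomatic cases reported. Capping the result at 100% is an assumption made here, chosen because the intervals in Table 1 do not exceed 100%; the function name is hypothetical.

```python
BASELINE_CFR = 0.014  # assumed baseline CFR of 1.4%


def reporting_fraction(ccfr, baseline_cfr=BASELINE_CFR):
    """Approximate fraction of symptomatic cases reported: baseline / cCFR.

    Assumption: values above 1 are capped at 1 (100% reported).
    """
    if ccfr <= 0.0:
        raise ValueError("cCFR must be positive")
    return min(1.0, baseline_cfr / ccfr)


high_cfr_country = reporting_fraction(0.10)   # cCFR of 10%
low_cfr_country = reporting_fraction(0.010)   # cCFR below the baseline
```

A cCFR below the baseline would imply more than 100% of cases reported, which is why the cap is applied in this sketch.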

Temporal variation model fitting

We estimate the level of under-reporting on every day for each country that has had more than ten deaths. We then fit a Gaussian Process (GP) model using the libraries greta and greta.gp. The parameters we fit and their priors are the following:

\[ \begin{aligned} &\sigma \sim \text{Log Normal}(-1, 1): \quad &\text{Variance of the reporting kernel} \\ &\text{L} \sim \text{Log Normal}(4, 0.5): \quad &\text{Lengthscale of the reporting kernel} \\ &\sigma_{\text{obs}} \sim \text{Truncated Normal}(0, 0.5): \quad &\text{Variance of the observation kernel, truncated at 0} \end{aligned} \]

The kernel is split into two components: the reporting kernel \(R\), and the observation kernel \(O\). The reporting component has a standard squared-exponential form. For the observation component, we use an i.i.d. noise kernel to account for observation overdispersion, which can smooth out overly clumped death time-series. This is important as some countries have been known to report an unusually large number of deaths on a single day, due to past under-reporting.
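To make the two kernel components concrete, the sketch below builds a covariance matrix from a squared-exponential kernel plus an i.i.d. noise kernel. It is written in plain Python for illustration rather than with greta.gp, and it assumes that the two components are summed and that the squared-exponential kernel takes the common form \(\sigma \exp(-(t - t')^2 / (2 L^2))\); the function names and parameter values are hypothetical.

```python
import math


def squared_exponential(t1, t2, variance, lengthscale):
    """Reporting kernel R: smooth correlation that decays with time gap."""
    return variance * math.exp(-0.5 * ((t1 - t2) / lengthscale) ** 2)


def iid_noise(t1, t2, variance):
    """Observation kernel O: independent noise, nonzero only on the diagonal.

    Assumes each time-point appears once, so t1 == t2 means the same day.
    """
    return variance if t1 == t2 else 0.0


def combined_kernel_matrix(times, variance, lengthscale, obs_variance):
    """Covariance matrix K = R + O over the given time-points (assumed sum)."""
    return [
        [
            squared_exponential(a, b, variance, lengthscale)
            + iid_noise(a, b, obs_variance)
            for b in times
        ]
        for a in times
    ]


# Three consecutive days with illustrative hyperparameter values.
K = combined_kernel_matrix([0, 1, 2], variance=0.5, lengthscale=10.0, obs_variance=0.1)
```

The noise term only inflates the diagonal, so a single day with an unusually large death count is absorbed as observation noise rather than pulling the smooth trend towards it.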

In the sampling and fitting process, we calculate the expected number of deaths at each time-point, given the baseline CFR. We then use a Poisson likelihood for the observed number of deaths, with the expected number of deaths as its rate.
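The likelihood step can be sketched as follows. The form of the expected deaths is an assumption made for illustration: baseline CFR multiplied by the number of known-outcome cases, divided by the reporting fraction on that day. The function names and inputs are hypothetical, and the sketch evaluates the log-likelihood for fixed reporting fractions rather than sampling them.

```python
import math


def expected_deaths(known_outcome_cases, reporting_fractions, baseline_cfr=0.014):
    """Assumed form: baseline CFR * known-outcome cases / reporting fraction."""
    return [
        baseline_cfr * known / frac
        for known, frac in zip(known_outcome_cases, reporting_fractions)
    ]


def poisson_log_likelihood(observed, expected):
    """Sum over time-points of log Poisson(observed | rate = expected)."""
    log_lik = 0.0
    for deaths, rate in zip(observed, expected):
        if rate <= 0.0:
            raise ValueError("Poisson rate must be positive")
        log_lik += deaths * math.log(rate) - rate - math.lgamma(deaths + 1)
    return log_lik


# Two hypothetical days with 1000 and 2000 known-outcome cases, 50% reported.
rates = expected_deaths([1000, 2000], [0.5, 0.5])
log_lik = poisson_log_likelihood([30, 50], rates)
```

A lower reporting fraction raises the expected deaths for the same reported cases, which is how the observed deaths inform the under-reporting estimate in this sketch.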

Limitations

Implicit in assuming that the under-reporting is \(\frac{1.4\%}{\text{cCFR}}\) for a given country is that any deviation from the assumed 1.4% CFR is entirely down to under-reporting. In reality, the burden on the healthcare system is a likely contributing factor to CFR estimates higher than 1.4%, along with many other country-specific factors.

The following is a list of the other prominent assumptions made in our analysis:

Code and data availability

The code is publicly available at https://github.com/thimotei/CFR_calculation. The data required for this analysis are time-series for both cases and deaths, along with the corresponding delay distribution. We scrape these data from the ECDC, using the NCoVUtils package [8].

Acknowledgements

The authors, on behalf of the Centre for the Mathematical Modelling of Infectious Diseases (CMMID) COVID-19 working group, wish to thank DSTL for providing the High Performance Computing facilities and associated expertise that have enabled these models to be prepared, run and processed in an appropriately rapid and highly efficient manner.

References

1 Russell TW, Hellewell J, Jarvis CI et al. Estimating the infection and case fatality ratio for COVID-19 using age-adjusted data from the outbreak on the Diamond Princess cruise ship. medRxiv 2020.

2 Verity R, Okell LC, Dorigatti I et al. Estimates of the severity of COVID-19 disease. medRxiv 2020.

3 Guan W-j, Ni Z-y, Hu Y et al. Clinical characteristics of coronavirus disease 2019 in China. New England Journal of Medicine 2020.

4 Shim E, Mizumoto K, Choi W et al. Estimating the risk of COVID-19 death during the course of the outbreak in Korea, February-March, 2020. medRxiv 2020.

5 Kucharski AJ, Edmunds WJ. Case fatality rate for Ebola virus disease in West Africa. The Lancet 2014;384:1260.

6 Nishiura H, Klinkenberg D, Roberts M et al. Early epidemiological assessment of the virulence of emerging infectious diseases: A case study of an influenza pandemic. PLoS One 2009;4.

7 Linton NM, Kobayashi T, Yang Y et al. Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: A statistical analysis of publicly available case data. Journal of Clinical Medicine 2020;9:538.

8 Abbott S MJ Hellewell J. NCoVUtils: Utility functions for the 2019-nCoV outbreak. doi:10.5281/zenodo.3635417 2020.